Trabectedin derails transcription-coupled nucleotide excision repair to induce DNA breaks in highly transcribed genes

Most genotoxic anticancer agents fail in tumors with intact DNA repair. Trabectedin, an agent that is more toxic to cells with active DNA repair, specifically transcription-coupled nucleotide excision repair (TC-NER), therefore provides therapeutic opportunities. To unlock the potential of trabectedin and inform its application in precision oncology, we need to understand the mechanism of the drug's TC-NER-dependent toxicity. Here, we show that abortive TC-NER of trabectedin-DNA adducts forms persistent single-strand breaks (SSBs), because the adducts block the second of the two sequential NER incisions. We map the 3'-hydroxyl groups of SSBs originating from the first NER incision at trabectedin lesions, recording TC-NER on a genome-wide scale. Trabectedin-induced SSBs occur primarily in the transcribed strands of active genes and peak near transcription start sites. Frequent SSBs are also found outside gene bodies, connecting TC-NER to divergent transcription from promoters. This work advances the use of trabectedin for precision oncology and for studying TC-NER and transcription.
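The strand and TSS analyses summarized above rest on two coordinate conventions: RNA polymerase II reads the template strand, which is opposite to a gene's annotated strand, and the TSS of a minus-strand transcript lies at its right-hand end. Below is a minimal Python sketch of both conventions. It assumes a hypothetical record for each called break (chromosome, 0-based position, and the strand carrying the nick, already resolved from the reads) and UCSC-style half-open transcript coordinates such as those in GENCODE knownGene. The class and function names are illustrative and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcript:
    chrom: str
    tx_start: int  # 0-based start (UCSC convention)
    tx_end: int    # end, exclusive
    strand: str    # annotated strand of the gene: "+" or "-"

@dataclass
class Break:
    chrom: str
    pos: int       # 0-based position of the break (hypothetical convention)
    strand: str    # strand carrying the nick: "+" or "-"

def strand_class(brk: Break, tx: Transcript) -> Optional[str]:
    """Classify a break inside a transcript as on the transcribed or non-transcribed strand."""
    if brk.chrom != tx.chrom or not (tx.tx_start <= brk.pos < tx.tx_end):
        return None  # break lies outside this transcript
    # The transcribed (template) strand is opposite to the gene's annotated strand.
    template = "-" if tx.strand == "+" else "+"
    return "transcribed" if brk.strand == template else "non-transcribed"

def distance_to_tss(brk: Break, tx: Transcript) -> int:
    """Signed distance in bp from the TSS of this transcript; positive = downstream, into the gene body."""
    if tx.strand == "+":
        return brk.pos - tx.tx_start
    # For "-" transcripts the TSS is the last base of the half-open interval.
    return (tx.tx_end - 1) - brk.pos
```

For example, a break at position 1200 on the "-" strand of a "+" gene spanning 1000 to 5000 is classified as "transcribed" and lies 200 bp downstream of the TSS. Breaks upstream of the TSS receive negative distances, which is where a signal from divergent transcription at promoters would appear in such a profile.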


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Software and code
Policy information about availability of computer code

Data collection
Microscopy images were acquired using a fluorescence microscope (BX53, Olympus). Western blot images were acquired using an automated imaging system (ChemiDoc Touch Imaging System, Bio-Rad Laboratories). DNA libraries were sequenced on an Illumina NovaSeq 6000 using a single-read protocol with a read length of 101 bp (R1).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
Sample size was not determined by any statistical method. For each experimental method, we chose sample sizes based on the technical challenges and throughput of the assay. These sizes are consistent with those used in earlier studies.
For every Comet Chip assay, at least 50 comets per condition were evaluated before drawing a conclusion (Nucleic Acids Res 48, e13 (2020)). For GLOE-Seq/TRABI-Seq experiments, sample sizes were set by the maximal throughput achievable by one experimenter: at most six samples could be processed in parallel in the 10-day DNA-library-preparation protocol. This resulted in six samples for each of the following cell lines: U2OS WT (2x 50 nM drug, 2x 20 nM drug, 2x DMSO), U2OS XPC-KO (3x 50 nM drug, 3x DMSO), U2OS XPA-KO (3x 50 nM drug, 3x DMSO) and HAP1 WT (3x 50 nM drug, 3x DMSO), as well as five samples for U2OS CSB-KO (2x 50 nM drug, 2x 20 nM drug, 1x DMSO). Three cell lines (U2OS WT, U2OS XPC-KO and HAP1 WT) are TC-NER-proficient and two cell lines (U2OS XPA-KO and U2OS CSB-KO) are TC-NER-deficient, which provides additional power for characterizing the involvement of TC-NER in the mechanism of drug toxicity.
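The sample accounting above can be checked with a short tally. This Python sketch encodes the GLOE-Seq/TRABI-Seq design exactly as stated (the drug is trabectedin) and groups libraries by TC-NER status. The dictionary layout and function names are illustrative only.

```python
# Conditions per cell line, as listed in the sample-size statement: (condition, replicates).
DESIGN = {
    "U2OS WT":     [("50 nM trabectedin", 2), ("20 nM trabectedin", 2), ("DMSO", 2)],
    "U2OS XPC-KO": [("50 nM trabectedin", 3), ("DMSO", 3)],
    "U2OS XPA-KO": [("50 nM trabectedin", 3), ("DMSO", 3)],
    "HAP1 WT":     [("50 nM trabectedin", 3), ("DMSO", 3)],
    "U2OS CSB-KO": [("50 nM trabectedin", 2), ("20 nM trabectedin", 2), ("DMSO", 1)],
}
TC_NER_PROFICIENT = {"U2OS WT", "U2OS XPC-KO", "HAP1 WT"}
MAX_PARALLEL = 6  # libraries that fit in one 10-day preparation batch

def libraries_per_line(design):
    """Number of sequencing libraries per cell line."""
    return {line: sum(n for _, n in conds) for line, conds in design.items()}

def libraries_by_tcner(design, proficient):
    """Total libraries from TC-NER-proficient versus TC-NER-deficient cell lines."""
    totals = {"proficient": 0, "deficient": 0}
    for line, n in libraries_per_line(design).items():
        totals["proficient" if line in proficient else "deficient"] += n
    return totals
```

With this design, 18 libraries come from TC-NER-proficient lines and 11 from TC-NER-deficient lines, 29 in total, and no cell line exceeds the six-sample batch limit.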
Data exclusions
Immunoblots with low signal-to-noise ratios that could not be quantified were excluded.
Preliminary data used to optimize the Comet Chip assay conditions for each cell line were excluded.

nature portfolio | reporting summary
April 2023

For GLOE-Seq/TRABI-Seq data analysis, low-quality reads and reads with Illumina adapter read-through were removed. The criteria for this exclusion and the counts of removed reads are provided in Methods and Supplementary Figure 8.
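The read-level exclusion can be illustrated with a minimal adapter-trimming and quality filter. This is a sketch under stated assumptions, not the authors' procedure: the actual criteria are in the Methods and Supplementary Figure 8, the thresholds MIN_LEN and MIN_MEAN_Q are hypothetical placeholders, and ADAPTER is the start of the standard Illumina TruSeq adapter, which may differ from the adapters used in GLOE-Seq/TRABI-Seq libraries. Dedicated tools such as cutadapt also tolerate mismatches, which this exact-match sketch does not.

```python
from typing import Optional, Tuple

ADAPTER = "AGATCGGAAGAGC"  # start of the Illumina TruSeq adapter (assumed; check the library design)
MIN_LEN = 20               # hypothetical minimum read length after trimming
MIN_MEAN_Q = 20            # hypothetical minimum mean Phred quality
PHRED_OFFSET = 33          # Phred+33 encoding used by current Illumina instruments

def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER, min_overlap: int = 3) -> Tuple[str, str]:
    """Cut a read at adapter read-through: a full exact match anywhere, else a partial match at the 3' end."""
    idx = seq.find(adapter)
    if idx == -1:
        idx = len(seq)
        # Look for the longest adapter prefix (at least min_overlap bases) occupying the read's last bases.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    return seq[:idx], qual[:idx]

def mean_quality(qual: str) -> float:
    """Mean Phred score of a non-empty quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)

def filter_read(seq: str, qual: str) -> Optional[str]:
    """Return the trimmed sequence, or None if the read is too short or of low quality."""
    seq, qual = trim_adapter(seq, qual)
    # The length test runs first, so mean_quality never sees an empty string.
    if len(seq) < MIN_LEN or mean_quality(qual) < MIN_MEAN_Q:
        return None
    return seq
```

Trimming happens before the quality check, so adapter bases do not inflate or deflate a read's mean quality.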

Replication
The number of independent experiments is given in the figure legends, figures and Methods. All replication attempts were successful.
Randomization
The whole field of each microwell of the 96-well Comet Chip was captured for fluorescence image analysis. All images captured from the Comet Chip assays were used and analyzed with the Comet analysis software from Trevigen (catalog # 4260-000-CS), so randomization was not needed. For GLOE-Seq/TRABI-Seq experiments, the drug-exposed and vehicle-exposed cultures were started from cells split from the same parental isogenic culture, so cells were randomly distributed between conditions.

Blinding
There was no subjective allocation for any experiments, thus blinding was not required for this study.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

The raw sequencing data and processed sequencing data (tsv files with called DNA breaks) generated in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession code GSE245883 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE245883]. Source data, including the western blots in Supplementary Figs. 2a and 3a, are provided with this paper. In the data analysis, we used the following external publicly available datasets:
Human reference genome: GRCh38 [https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip] (pre-built bowtie2 index).
Transcript coordinates: GENCODE/V41/knownGene, retrieved from the UCSC Table Browser.
Canonical transcripts of genes: GENCODE/V41/knownCanonical, retrieved from the UCSC Table Browser.
Gene expression: DepMap Public 22Q2 [https://depmap.org/portal/download/all/?releasename=DepMap+Public+22Q2&filename=CCLE_expression_full.csv], cell-line accession numbers ACH-000364 (U2OS WT) and ACH-002475 (HAP1 WT).
Protein-coding genes: GENCODE/V41/knownToNextProt, retrieved from the UCSC Table Browser.
Coordinates of centromeres and gaps: retrieved from the UCSC Table Browser for GRCh38.
Chromatin accessibility and histone modification: from GEO [https://www.ncbi.nlm.nih.gov/geo/query/acc.

Policy information about studies with human participants or human data. See also policy information about sex, gender (identity/presentation), and sexual orientation and race, ethnicity and racism.
If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.